Fault-tolerant sub-lithographic design with rollback recovery
Abstract
Shrinking feature sizes and energy levels, coupled with high clock rates and decreasing node capacitance, lead us into a regime where transient errors in logic cannot be ignored. Consequently, several recent studies have focused on feed-forward spatial redundancy techniques to combat these high transient fault rates. To complement these studies, we analyze fine-grained rollback techniques and show that they can offer lower spatial redundancy factors with no significant impact on system performance for fault rates up to one fault per device per ten million cycles of operation (P(f) = 10^-7) in systems with 10^12 susceptible devices. Further, we concretely demonstrate these claims on nanowire-based programmable logic arrays. Despite expensive rollback buffers and a general-purpose, conservative analysis, we show that the area overhead factor of our technique is roughly an order of magnitude lower than that of a gate-level feed-forward redundancy scheme.
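The numbers in the abstract show why rollback has to be fine-grained: at P(f) = 10^-7 per device per cycle and 10^12 susceptible devices, the system as a whole expects about 10^5 device faults every cycle, so a single global checkpoint would essentially never complete a clean interval. The sketch below illustrates this under simplifying assumptions that are not taken from the paper: faults are independent per device and per cycle, a hypothetical rollback block of `block_devices` devices is checkpointed every `interval_cycles` cycles, and a faulty interval is simply retried until it runs clean (detection latency and restore cost are ignored). It is an illustration of the scaling argument, not the authors' analysis.

```python
import math


def expected_faults_per_cycle(num_devices: float, p_fault: float) -> float:
    """Expected number of device faults per cycle across the whole system."""
    return num_devices * p_fault


def clean_interval_probability(block_devices: int, interval_cycles: int,
                               p_fault: float) -> float:
    """Probability that one block runs a full interval with no fault.

    Assumes independent faults per device per cycle, so the result is
    (1 - p)^(devices * cycles). log1p keeps this accurate for tiny p.
    May underflow to 0.0 for very large exposures.
    """
    exposures = block_devices * interval_cycles
    return math.exp(exposures * math.log1p(-p_fault))


def expected_slowdown(block_devices: int, interval_cycles: int,
                      p_fault: float) -> float:
    """Expected attempts per interval under a retry-until-clean model.

    Each attempt is an independent Bernoulli trial with success probability q,
    so the number of attempts is geometric with mean 1/q. Returns math.inf when
    q underflows to zero (a clean interval is practically impossible).
    """
    q = clean_interval_probability(block_devices, interval_cycles, p_fault)
    if q == 0.0:
        return math.inf
    return 1.0 / q


if __name__ == "__main__":
    p = 1e-7  # fault rate quoted in the abstract
    print("faults per cycle, 1e12 devices:", expected_faults_per_cycle(1e12, p))
    # Hypothetical block sizes, chosen only to show the trend.
    for devices in (10**3, 10**6, 10**12):
        print(devices, "devices, 10-cycle interval:",
              expected_slowdown(devices, 10, p))
```

With small blocks the expected slowdown stays close to 1, while a single block spanning all 10^12 devices yields an unbounded slowdown; this is the qualitative reason for partitioning the system into many independently recoverable regions.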
Journal: Nanotechnology
Volume 19, Issue 11
Published: 2008